Skip or minimise Apache log phase

on 08.10.2008 14:32:26 by aw

Hi.

Following a message posted on the Apache users list, I am just curious
if via mod_perl there could be a solution to the following issue :

A busy Apache server (with several VirtualHosts, why not ?) is being
accessed from internal network clients (IP address 192.168.*) as well as
by external clients (any other IPs).
Among the internal clients are some GoogleBots, which generate thousands
of accesses, all logged in the access logs of the hosts. These accesses
come to generate more than 90% of the total, which really bothers the
sysadmins when they have to scan any logfile for something else.

Would there be any way, using mod_perl, to detect such accesses early,
and to either cancel the log phase for them, or else redirect the
logging to some sink file, or else at least set some parameter so that
the verbosity of the log for these accesses would be drastically reduced ?
(Of course the requests themselves should just go through and still be
handled properly)

Looking at the description of the PerlLogHandler at
http://perl.apache.org/docs/2.0/user/handlers/http.html#PerlLogHandler
I find the following paragraph, which tends to indicate that Apache log
handlers will run anyway, but maybe there is still a devious solution ?

quote
First the handler tries to figure out what username the request is
issued for, if it fails to match the URI, it simply returns
Apache2::Const::DECLINED, letting other log handlers to do the logging.
Though it could return Apache2::Const::OK since all other log handlers
will be run anyway.
unquote

Thanks

Re: Skip or minimise Apache log phase

on 08.10.2008 15:25:44 by Vegard Vesterheim

On Wed, 08 Oct 2008 14:32:26 +0200 André Warnier wrote:

> Hi.
>
> Following a message posted on the Apache users list, I am just curious
> if via mod_perl there could be a solution to the following issue :
>
> A busy Apache server (with several VirtualHosts, why not ?) is being
> accessed from internal network clients (IP address 192.168.*) as well
> as by external clients (any other IPs).
> Among the internal clients are some GoogleBots, which generate
> thousands of accesses, all logged in the access logs of the hosts.
> These accesses come to generate more than 90% of the total, which
> really bothers the sysadmins when they have to scan any logfile for
> something else.
>
> Would there be any way, using mod_perl, to detect such accesses early,
> and to either cancel the log phase for them, or else redirect the
> logging to some sink file, or else at least set some parameter so that
> the verbosity of the log for these accesses would be drastically
> reduced ?

You can do this without involving mod_perl. Simply use one of the
directives SetEnvIf or BrowserMatch to detect these bots and set a
variable like 'is_googlebot', and then use a conditional log-statement:

CustomLog .... env=!is_googlebot

See also:
http://httpd.apache.org/docs/2.2/mod/mod_setenvif.html
http://httpd.apache.org/docs/2.2/mod/mod_log_config.html#customlog
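
As a rough sketch of what that could look like (the User-Agent pattern, the
example address 192.168.1.50, the log paths and the "combined" LogFormat
nickname are all assumptions, adjust them to your setup):

```apache
# Flag the crawler by User-Agent (the pattern is an assumption)...
BrowserMatchNoCase googlebot is_googlebot
# ...or by source address (hypothetical address of the internal bot)
SetEnvIf Remote_Addr "^192\.168\.1\.50$" is_googlebot

# Main access log: everything except the flagged requests
CustomLog logs/access_log combined env=!is_googlebot
# Optional "sink" log that receives only the flagged requests
CustomLog logs/googlebot_log combined env=is_googlebot
```

The requests themselves are still handled normally; only the logging
changes. Leave out the second CustomLog line if the bot traffic should not
be logged anywhere.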

- Vegard V -

Re: Skip or minimise Apache log phase

on 08.10.2008 15:56:05 by aw

Vegard Vesterheim wrote:
>
> CustomLog .... env=!is_googlebot
>
> See also:
> http://httpd.apache.org/docs/2.2/mod/mod_setenvif.html
> http://httpd.apache.org/docs/2.2/mod/mod_log_config.html#customlog
>
> - Vegard V -
>

That looks just too easy to be true..
But I believe you of course, and Thanks.

It's just that I have seen this same kind of question pop up regularly
on forums, and I don't remember ever seeing the simple suggestion above.
....
As a matter of fact, the original poster on the other forum just got the
same answer from someone else.

I do remain interested in a possible mod_perl solution though, because I
already have some add-on Apache/perl handler modules where such a
functionality would come in handy sometimes.

I have a suspicion that I might have to look at the ServerRec or
something like that, and make the change in the configuration on-the-fly...

Re: Skip or minimise Apache log phase

on 08.10.2008 18:25:05 by Graham TerMarsch

On Wednesday 08 October 2008, André Warnier wrote:
> Vegard Vesterheim wrote:
> > CustomLog .... env=!is_googlebot
> >
> > See also:
> > http://httpd.apache.org/docs/2.2/mod/mod_setenvif.html
> > http://httpd.apache.org/docs/2.2/mod/mod_log_config.html#customlog
> >
> > - Vegard V -
>
> That looks just too easy to be true..
> But I believe you of course, and Thanks.

It is that easy... I've used that type of configuration many times, setting
up ENV vars for various things and then logging them to separate files.

The most common thing I've used it for is to skip logging of requests for
images.
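
For the image case, a minimal sketch (the variable name, the list of
extensions and the log path are only examples, and "combined" is assumed to
be defined with LogFormat):

```apache
# Mark requests for common image types
SetEnvIf Request_URI "\.(gif|jpe?g|png)$" dontlog
# Any further SetEnvIf or BrowserMatch rule can set the same variable

# Log everything that was not marked
CustomLog logs/access_log combined env=!dontlog
```

In 2.2 the env= clause tests a single variable, so the usual approach is to
have every rule set that same variable.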

-- 
Graham TerMarsch
Howling Frog Internet Development, Inc.

Re: Skip or minimise Apache log phase

on 08.10.2008 18:52:46 by Fred Moyer

André Warnier wrote:
> Vegard Vesterheim wrote:
>>
>> CustomLog .... env=!is_googlebot
>>
>> See also: http://httpd.apache.org/docs/2.2/mod/mod_setenvif.html
>> http://httpd.apache.org/docs/2.2/mod/mod_log_config.html#customlog
>>
>> - Vegard V -
>>
>
> That looks just too easy to be true..
> But I believe you of course, and Thanks.
>
> It's just that I have seen this same kind of question pop up regularly
> on forums, and I don't remember ever seeing the simple suggestion above.
> ...
> As a matter of fact, the original poster on the other forum just got the
> same answer from someone else.
>
> I do remain interested in a possible mod_perl solution though, because I
> already have some add-on Apache/perl handler modules where such a
> functionality would come in handy sometimes.

You could use a transhandler to set your own loghandler based on the
request url.

$r->set_handlers( PerlLogHandler => [ 'My::PerlLogHandler' ] )
    if $should_skip_apache_loghandler;

And then write your own PerlLogHandler which returns Apache2::Const::OK
(which should tell Apache to skip the log handler phase).

package My::PerlLogHandler;

use strict;
use warnings;

use Apache2::Const -compile => qw( OK );

# Log-phase handler that does no logging and just reports success.
sub handler {
    my $r = shift;
    return Apache2::Const::OK;
}

1;

>
> I have a suspicion that I might have to look at the ServerRec or
> something like that, and make the change in the configuration on-the-fly...

Re: Skip or minimise Apache log phase

on 08.10.2008 23:05:23 by Fred Moyer

André Warnier wrote:
> Fred Moyer wrote:
>> André Warnier wrote:
>
> Thanks for the answer and the above code.
> But what about the following ?
>
> http://perl.apache.org/docs/2.0/user/handlers/http.html#PerlLogHandler
> quote
> [...]
> it simply returns Apache2::Const::DECLINED, letting other log handlers
> to do the logging. Though it could return Apache2::Const::OK since all
> other log handlers will be run anyway.

Ah right, PerlLogHandler is RUN_ALL - my bad. I remember wanting to do
something similar a while ago and ended up just marking those log
entries so that I could 'grep -v MY_IGNORE' the log and not have to deal
with them.
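
One way such marking could be done from mod_perl (a sketch only, not
necessarily what I used back then): set an environment variable early in
the request, which mod_log_config can then print with %{MY_IGNORE}e or test
with env=. The package name and the User-Agent pattern are made up for the
example.

```perl
package My::MarkIgnored;    # hypothetical module name

use strict;
use warnings;

use Apache2::RequestRec ();    # $r->headers_in, $r->subprocess_env
use APR::Table ();             # ->get on the headers table
use Apache2::Const -compile => qw( DECLINED );

# Meant to run early, e.g. as a PerlPostReadRequestHandler.
sub handler {
    my $r = shift;

    my $agent = $r->headers_in->get('User-Agent') || '';

    # The pattern is an assumption; adjust it to the crawler's real User-Agent.
    if ( $agent =~ /googlebot/i ) {
        # Sets a per-request variable, much like SetEnvIf would.
        $r->subprocess_env( MY_IGNORE => 1 );
    }

    # Do not claim the phase; let any other handlers run as usual.
    return Apache2::Const::DECLINED;
}

1;
```

With "PerlPostReadRequestHandler My::MarkIgnored" in the config, a LogFormat
containing %{MY_IGNORE}e marks the lines for grep, and "CustomLog ...
env=!MY_IGNORE" should leave them out of the log altogether.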

> unquote
>
> as compared to your
>
> And then write your own PerlLogHandler which returns Apache2::Const::OK
> (which should tell Apache to skip the log handler phase).
>
> ?
>
> Of course I can try it, but if you already know..
>

Re: Skip or minimise Apache log phase

on 09.10.2008 08:30:11 by Hendrik Van Belleghem

Hi André,
If you're looking into the mod_perl approach, you might want to have a peek
at Apache::LogIgnore. I developed the module for Apache v1 and haven't
tested it on v2 but it might be a good place to start.

http://search.cpan.org/~beatnik/Apache-LogIgnore-0.03/

HTH

Hendrik

2008/10/8 André Warnier

> Hi.
>
> Following a message posted on the Apache users list, I am just curious if
> via mod_perl there could be a solution to the following issue :
>
> A busy Apache server (with several VirtualHosts, why not ?) is being
> accessed from internal network clients (IP address 192.168.*) as well as by
> external clients (any other IPs).
> Among the internal clients are some GoogleBots, which generate thousands of
> accesses, all logged in the access logs of the hosts. These accesses come
> to generate more than 90% of the total, which really bothers the sysadmins
> when they have to scan any logfile for something else.
>
> Would there be any way, using mod_perl, to detect such accesses early, and
> to either cancel the log phase for them, or else redirect the logging to
> some sink file, or else at least set some parameter so that the verbosity of
> the log for these accesses would be drastically reduced ?
> (Of course the requests themselves should just go through and still be
> handled properly)
>
> Looking at the description of the PerlLogHandler at
> http://perl.apache.org/docs/2.0/user/handlers/http.html#PerlLogHandler
> I find the following paragraph, which tends to indicate that Apache log
> handlers will run anyway, but maybe there is still a devious solution ?
>
> quote
> First the handler tries to figure out what username the request is issued
> for, if it fails to match the URI, it simply returns
> Apache2::Const::DECLINED, letting other log handlers to do the logging.
> Though it could return Apache2::Const::OK since all other log handlers will
> be run anyway.
> unquote
>
> Thanks
>



-- 
Hendrik Van Belleghem
Spine - The backbone for your website - http://spine.sf.net

Re: Skip or minimise Apache log phase

on 09.10.2008 09:09:22 by aw

Hendrik Van Belleghem wrote:
> Hi André,
> If you're looking into the mod_perl approach, you might want to have a peek
> at Apache::LogIgnore. I developed the module for Apache v1 and haven't
> tested it on v2 but it might be a good place to start.
>
> http://search.cpan.org/~beatnik/Apache-LogIgnore-0.03/
>
Many thanks.
I just had a look at the module.
From what I understand, it returns DONE whenever a condition matches,
the idea being probably then to skip other logging steps.
But according to the mod_perl 2.x docs quoted in previous messages, that
would still let the other log handlers run.
But, maybe the mp2 docs are wrong ?
I guess I'll have to test that under Apache2/mp2.
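
A throwaway handler for that test could look something like the sketch
below. The package name is made up, and remote_ip is the Apache 2.2 name of
that Connection method. Whether returning DONE really keeps the request out
of the access log is exactly what the test is supposed to show, so I make no
claim about the result here.

```perl
package My::SkipLogTest;    # hypothetical module name, for testing only

use strict;
use warnings;

use Apache2::RequestRec ();    # $r->connection
use Apache2::Connection ();    # $c->remote_ip (Apache 2.2 API)
use Apache2::Const -compile => qw( DONE DECLINED );

sub handler {
    my $r = shift;

    my $ip = $r->connection->remote_ip;

    # Internal clients: return DONE, as Apache::LogIgnore does on a match.
    return Apache2::Const::DONE if $ip =~ /^192\.168\./;

    # Everyone else: stay out of the way of the normal logging.
    return Apache2::Const::DECLINED;
}

1;
```

It would be enabled with "PerlLogHandler My::SkipLogTest", after which one
request from an internal address and one from an external address should be
enough to compare what ends up in the access log.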